PineSAP—sequence alignment and SNP identification pipeline
Authors
Abstract
The Pine Alignment and SNP Identification Pipeline (PineSAP) provides a high-throughput solution to single nucleotide polymorphism (SNP) prediction using multiple sequence alignments from re-sequencing data. The pipeline combines customized scripting, existing utilities and machine learning to increase the speed and accuracy of SNP calls. Its implementation results in significantly improved multiple sequence alignments and SNP identifications when compared with existing solutions. The use of machine learning in SNP identification extends the pipeline's application to any eukaryotic species for which full genome sequence information is unavailable.
Availability: All code used for this pipeline is freely available at the Dendrome project website (http://dendrome.ucdavis.edu/adept2/resequencing.html).
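As a rough illustration of what SNP prediction from a multiple sequence alignment involves, the sketch below scans alignment columns and reports those where a second base appears often enough to be a candidate SNP. It is not PineSAP's method: the function name `candidate_snp_columns`, the `min_minor_count` threshold and the toy alignment are all hypothetical, and the machine-learning step that the abstract describes is not reproduced here.

```python
from collections import Counter


def candidate_snp_columns(aligned_seqs, min_minor_count=2):
    """Return (column_index, base_counts) for columns that look polymorphic.

    aligned_seqs: equal-length strings from a multiple sequence alignment,
    with '-' for gaps and 'N' for ambiguous calls.
    min_minor_count: hypothetical threshold; the second most common base
    must appear at least this many times for the column to be reported.
    """
    if not aligned_seqs:
        return []
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")

    hits = []
    for col in range(length):
        # Count only unambiguous bases; gaps and N are ignored.
        counts = Counter(
            s[col].upper() for s in aligned_seqs if s[col].upper() in "ACGT"
        )
        ranked = counts.most_common(2)
        if len(ranked) == 2 and ranked[1][1] >= min_minor_count:
            hits.append((col, dict(counts)))
    return hits


# Toy alignment (made up for illustration): column 3 carries both T and A.
alignment = [
    "ACGTACGT",
    "ACGTACGT",
    "ACGAACGT",
    "ACGAAC-T",
    "ACGTACGT",
]
print(candidate_snp_columns(alignment))
```

A simple count threshold like this cannot tell true polymorphisms from sequencing errors, which is the kind of decision the abstract assigns to machine learning.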
Similar resources
CFSAN SNP Pipeline: an automated method for constructing SNP matrices from next-generation sequence data
The analysis of next-generation sequence (NGS) data is often a fragmented step-wise process. For example, multiple pieces of software are typically needed to map NGS reads, extract variant sites, and construct a DNA sequence matrix containing only single nucleotide polymorphisms (i.e., a SNP matrix) for a set of individuals. The management and chaining of these software pieces and their outputs...
RTPrimerDB: the real-time PCR primer and probe database, major update 2006
The RTPrimerDB (http://medgen.ugent.be/rtprimerdb) project provides a freely accessible data retrieval system and an in silico assay evaluation pipeline for real-time quantitative PCR assays. Over the last year the number of user submitted assays has grown to 3500. Data conveyance from Entrez Gene by establishing an assay-to-gene relationship enables the addition of new primer assays for one of...
An Integrated SNP Mining and Utilization (ISMU) Pipeline for Next Generation Sequencing Data
Open source single nucleotide polymorphism (SNP) discovery pipelines for next generation sequencing data commonly require working knowledge of the command-line interface, massive computational resources and expertise, which is a daunting task for biologists. Further, the SNP information generated may not be readily used for downstream processes such as genotyping. Hence, a comprehensive pipeline ha...
SIMPLEX: Cloud-Enabled Pipeline for the Comprehensive Analysis of Exome Sequencing Data
In recent studies, exome sequencing has proven to be a successful screening tool for the identification of candidate genes causing rare genetic diseases. Although underlying targeted sequencing methods are well established, necessary data handling and focused, structured analysis still remain demanding tasks. Here, we present a cloud-enabled autonomous analysis pipeline, which comprises the com...
Engineering a high-performance SNP detection pipeline
We present Sprite, a bioinformatic data analysis pipeline for detecting single nucleotide polymorphisms (SNPs) in the human genome. A SNP detection pipeline for next-generation sequencing data uses several software tools, including tools for read preprocessing, read alignment, and SNP calling. We target end-to-end scalability and I/O efficiency in Sprite by merging tools in this pipeline and el...